We consider the nonparametric estimation of a periodic function that is observed in additive Gaussian white noise after convolution with a “boxcar,” the indicator function of an interval. This is an idealized model for the problem of recovery of noisy signals and images observed with “motion blur.” If the length of the boxcar is rational, then certain frequencies are irretrievably lost in the periodic model.

We consider the rate of convergence of estimators when the length of the boxcar is irrational, using classical results on approximation of irrationals by continued fractions. A basic question of interest is whether the minimax rate of convergence is slower than for nonperiodic problems with 1/f-like convolution filters. The answer turns out to depend on the type and smoothness of functions being estimated in a manner not seen with “homogeneous” filters.

1. Introduction. 

1.1. Statement of problem and motivation. Suppose that we observe $Y(t)$ for $t \in [-1,1]$, where $Y$ is drawn from an indirect estimation model in Gaussian white noise:

$$Y(t) = \int_{-1}^{t} K_a f(s)\,ds + \epsilon W(t), \tag{1}$$
where
$$K_a f(t) = \frac{1}{2a}\int_{-a}^{a} f(t-u)\,du, \qquad a > 0, \tag{2}$$
$W = \{W(t), t \in [-1,1]\}$ is a standard two-sided Wiener process and $\epsilon$ is small and assumed known. It is desired to estimate the unknown signal $f$, assumed to be periodic on $[-1,1]$. We refer to this as boxcar deconvolution, because $K_a f = f * k_a$ corresponds to convolution with the step function $k_a(t) = (2a)^{-1} I\{|t| < a\}$.

The problem has the peculiar feature that if the boxcar half-width $a$ is rational, then certain frequencies are completely unrecoverable from the data. Indeed, because of the periodic and convolution structure, the problem is diagonalized in the Fourier basis. Thus, let $e_k(t) = 2^{-1/2}e^{i\pi k t}$ for integer $k \in \mathbb{Z}$. Then $K_a e_k = r_k e_k$, where the eigenvalues $r_0 = 1$ and
$$r_k = \frac{\sin \pi k a}{\pi k a}, \qquad k \neq 0. \tag{3}$$


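Furthermore, setting $y_k = \int_{-1}^{1} \bar e_k(t)\,dY(t)$, $\theta_k = \langle f, e_k\rangle := \int_{-1}^{1} f(t)\bar e_k(t)\,dt$ and $z_k = \int_{-1}^{1} \bar e_k(t)\,dW(t)$, we find that model (1) is equivalent to
$$y_k = r_k\theta_k + \epsilon z_k, \qquad k \in \mathbb{Z}. \tag{4}$$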


For rational $a = p/q$, the eigenvalues vanish for all integer multiples $k = jq$ of $q$. In the Fourier expansion $\sum_k \langle f, e_k\rangle e_k$, all information about the coefficients $\langle f, e_{jq}\rangle$ is lost after convolution. For irrational $a$, however, the inversion formula
$$\langle f, e_k\rangle = \frac{1}{r_k}\langle K_a f, e_k\rangle \tag{5}$$
is at least well defined, since $r_k \neq 0$ for any $k \in \mathbb{Z}$. The object of this paper is to study the quality of estimation of $f$ attainable for irrational $a$ in the small noise limit $\epsilon \to 0$.
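As a quick numerical illustration of (3), the following sketch evaluates $r_k$ directly and lists the frequencies $k = jq$ that are lost when $a = p/q$ is rational. The helper names `boxcar_eigenvalue` and `lost_frequencies` are hypothetical, and in floating point "vanishing" only holds up to rounding error.

```python
import math
from fractions import Fraction


def boxcar_eigenvalue(k, a):
    """Eigenvalue r_k of the boxcar operator K_a, as in (3); r_0 = 1."""
    if k == 0:
        return 1.0
    x = math.pi * k * a
    return math.sin(x) / x


def lost_frequencies(a, kmax):
    """For rational a = p/q (given as a Fraction in lowest terms), return the
    frequencies 1 <= k <= kmax with r_k = 0, i.e. the multiples k = jq."""
    q = a.denominator
    return [k for k in range(1, kmax + 1) if k % q == 0]
```

For example, `lost_frequencies(Fraction(2, 5), 20)` returns the multiples of 5, while for the irrational $a = 2/(\sqrt5+1)$ none of the computed $r_k$ is zero.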

Motivation for studying this special problem arises from several sources: 


(i) It may be viewed as an idealization of the problem of recovery from linear motion blur plus noise in a fixed field of view. If a camera is passing over a scene $f(x,y)$ along a direction $(1,r)$ at unit speed, then in exposure time $2a$ the image acquired at point $(x,y)$ may be modeled as
$$Kf(x,y) = \frac{1}{2a}\int_{-a}^{a} f(x+u, y+ru)\,du. \tag{6}$$
Our model is a one-dimensional version of horizontal motion, $r = 0$. While the periodicity assumption on $f$ may seem artificial, it does capture the property that if $f$ is locally periodic with period $2a$ near $(x,y)$ (as in certain textures), then $Kf$ is locally constant near $(x,y)$. Compare the discussion in Section 5.1. A more detailed discussion of linear motion blur, with photographic examples, may be found in Bertero and Boccacci [(1998), pages 54–58].
(ii) It is related to the problem of periodic density estimation with uniform errors. Suppose $X_1, \ldots, X_n$ are i.i.d. random variables with unknown periodic density $f$ on the circle $\mathbb{T}$. However, the $X_i$ are not observed; instead we see jittered versions
$$\tilde X_i = X_i + Z_i,$$
where $\{Z_i\}$ are i.i.d. uniformly distributed on $[-a,a]$ and circular addition is used.

(iii) As an inverse problem, (5) is nonstandard: the eigenvalues oscillate inside an envelope decaying like 1/frequency for $k \neq 0$,
$$|r_k| \le c/|k|, \qquad c = (\pi a)^{-1}.$$
We may ask the following: is the quality of estimation, measured by the minimax rate of convergence as $\epsilon \to 0$, determined by the $1/|k|$ decay, or is it affected by the oscillatory behavior?

(iv) Let $\|x\|$ denote the distance from $x \in \mathbb{R}$ to the nearest integer. For $k \neq 0$,
$$\frac{2}{\pi}\,\frac{\|ka\|}{|ka|} \le |r_k| \le \frac{\|ka\|}{|ka|}, \tag{7}$$
and so the oscillations in (3) are driven by
$$\|ka\| := \inf\{|ka - l| : l \in \mathbb{Z}\}. \tag{8}$$
The study of such “Diophantine approximations” uses the classical theory of continued fractions [see, for example, Lang (1966) and Khinchin (1992)] and plays a basic role in this paper.
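Bound (7) follows from $|\sin \pi x| = \sin(\pi\|x\|)$ and $2d \le \sin(\pi d) \le \pi d$ for $0 \le d \le \tfrac12$. A minimal numerical check of (7), assuming $a > 0$ and using hypothetical helper names, might look like this:

```python
import math


def dist_to_nearest_int(x):
    """||x|| as in (8): distance from x to the nearest integer."""
    return abs(x - round(x))


def check_bound_7(a, kmax, tol=1e-15):
    """Check (2/pi)||ka||/|ka| <= |r_k| <= ||ka||/|ka| for k = 1..kmax (a > 0)."""
    for k in range(1, kmax + 1):
        ka = k * a
        r_k = abs(math.sin(math.pi * ka) / (math.pi * ka))
        d = dist_to_nearest_int(ka)
        lower = 2 * d / (math.pi * ka)
        upper = d / ka
        if not (lower - tol <= r_k <= upper + tol):
            return False
    return True
```

The small tolerance only absorbs floating-point rounding; the inequalities themselves are exact.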

There is a large literature on statistical inverse problems; for some recent reviews see Tenorio (2001) and Evans and Stark (2002). In particular, the sequence space formulation studied here has received substantial attention: a sample of recent works, in addition to those cited below, includes Wahba (1990), Johnstone and Silverman (1990), Koo (1993), Belitser and Levit (1995), Donoho (1995), Mair and Ruymgaart (1996), Golubev and Khas’minskii (1999, 2001) and Cavalier, Golubev, Picard and Tsybakov (2002). However, much of this literature is concerned with eigenvalue sequences having (up to constants) monotonic behavior as $k$ increases. Papers that do specifically address the boxcar deconvolution problem include Hall, Ruymgaart, van Gaans and van Rooij (2001), Groeneboom and Jongbloed (2003) and O’Sullivan and Roy Choudhury (2001); see Section 5.1 for some further discussion.
1.2. Effective degree of ill-posedness. Problem (1) is an example of a linear statistical inverse problem in which one observes a noisy version of $Kf$ for some linear operator $K$ and wishes to reconstruct $f$. Such linear inverse problems are typically ill-posed in the sense of Hadamard: the inversion does not depend continuously on the observed data. One manifestation of this is that rates of convergence of estimators as $\epsilon \to 0$ are slower than in the direct case, in which $f$ itself is observed with noise. We shall formulate some well-known existing results in terms of a notion of “degree of ill-posedness” (DIP) in order more easily to state the results of the present paper.

Under appropriate conditions, $K$ will have a singular value decomposition, and in terms of coefficients in the singular system expansions, the observations may be written in a sequence form
$$y_k = r_k\theta_k + \epsilon z_k, \qquad k \in \mathbb{Z}, \tag{9}$$
or, equivalently, after dividing through by $r_k$, as
$$\tilde y_k = \theta_k + \epsilon_k z_k, \tag{10}$$
where $\tilde y_k = y_k/r_k$ and $\epsilon_k = \epsilon/r_k$. Let $\|\theta\|_2^2 = \sum_k \theta_k^2$. Define the (nonlinear) minimax risk of estimation with respect to a parameter space $\Theta \subset \ell_2$ via
$$R_N(\Theta,\epsilon) = \inf_{\hat\theta}\sup_{\theta\in\Theta} E\|\hat\theta - \theta\|_2^2, \tag{11}$$
where the infimum is taken over all (measurable) functions $\hat\theta$ of the data. We define the linear minimax risk by
$$R_L(\Theta,\epsilon) = \inf_{\hat\theta_L}\sup_{\theta\in\Theta} E\|\hat\theta_L - \theta\|_2^2,$$
where attention is restricted to the subclass of linear estimators $\hat\theta_L = (\hat\theta_k^L)$ with $\hat\theta_k^L = c_k\tilde y_k$ for some sequence $(c_k)$.

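Parameter spaces of primary interest in this paper include, for $\sigma > 0$ and $C > 0$, hyperrectangles
$$\mathcal{H}^\sigma(C) = \{\theta : |\theta_k| \le C|k|^{-(\sigma+1/2)},\ k \neq 0, \text{ and } \theta_0 \in \mathbb{R}\} \tag{12}$$
and ellipsoids
$$\Theta^\sigma(C) = \Big\{\theta : \sum_k k^{2\sigma}\theta_k^2 \le C^2\Big\}. \tag{13}$$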

Remark 1. Within these scales of spaces, the parameter $\sigma$ measures smoothness: larger $\sigma$ corresponds to faster decay of coefficients. When the $(\theta_k)$ are Fourier coefficients, the ellipsoids correspond exactly to mean-square smoothness of the derivative of order $\sigma$ of $f = \sum_k \theta_k e_k$. [See, e.g., Kress (1999), Chapter 8.1.] There is no such simple characterization for hyperrectangles; the definition (12) is chosen to yield the same rates of convergence as (13) in the homogeneous cases described next. The parameter $C$ measures size: it corresponds to the radius of balls within these spaces.

Remark 2. In (5) we used the complex exponentials $e_k$. The model has the same form if instead one uses the real trigonometric basis $e_k(t) = \cos \pi k t$, $\sin \pi |k| t$ or $1/\sqrt2$ according as $k > 0$, $k < 0$ or $k = 0$. Model (9)–(10) applies to indices $k \in \mathbb{Z}$. For convenience in the rest of the paper, we restrict the index $k$ to $\mathbb{N}_+ = \{1, 2, \ldots\}$. Indeed, since spaces such as (12) and (13) are symmetric with respect to $\pm k$, we have $R_N(\Theta,\epsilon;\mathbb{Z}) = 2R_N(\Theta,\epsilon;\mathbb{N}_+) + \epsilon^2$, with the analogous statement valid also for the linear minimax risks. Consequently, rates of convergence are certainly unaffected by working on $\mathbb{N}_+$.

Remark 3. The notation $a(\epsilon) \asymp b(\epsilon)$ means that there exist constants $c_1, c_2$ such that, for sufficiently small $\epsilon$, $c_1 b(\epsilon) \le a(\epsilon) \le c_2 b(\epsilon)$. The constants $c_1, c_2$ and other generic constants (denoted by $c$ and not necessarily the same at each appearance) may depend on parameters of the smoothness class $\Theta$ such as $\sigma$, but they do not depend on $\epsilon$, $\theta$ or the size parameter $C$. While the size constant $C$ clearly does not affect the rate of convergence as $\epsilon \to 0$, we consider it useful to show the order of dependence of minimax risks on $C$. The notation $a_k \sim b_k$ means that $\lim_{k\to+\infty}(a_k/b_k) = 1$. The notation $a_k \equiv c$ means that, for all $k$, $a_k = c$.


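Suppose that the eigenvalues satisfy a homogeneous decay condition $r_k \asymp |k|^{-\alpha}$ and that $\Theta = \mathcal{H}^\sigma(C)$ or $\Theta = \Theta^\sigma(C)$. Then it is well known [e.g., Korostelev and Tsybakov (1993), Chapter 9] that
$$R_N(\Theta,\epsilon) \asymp C^{2(1-r)}\epsilon^{2r}, \qquad r = \frac{\sigma}{\sigma + \alpha + 1/2}. \tag{14}$$
For direct data we have $r_k \equiv 1$ in (9), and it is known that (14) holds with
$$r = r_D = \frac{\sigma}{\sigma + 1/2}.$$
This motivates the following definition of effective DIP:
$$\alpha(K,\Theta) := \sigma\Big(\frac{1}{r} - \frac{1}{r_D}\Big). \tag{15}$$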


For indirect problems $\alpha(K,\Theta)$ gives a measure of the effect (on the convergence rate) due to the inversion process. For example, if $K$ is an $\alpha$-fractional integration operator and $\Theta = \Theta^\sigma(C)$, then $r_k \asymp |k|^{-\alpha}$ and so, in this case, $\alpha(K,\Theta) = \alpha$. As $\alpha$ gets larger it becomes more and more difficult to recover $f$.
Fig. 1. An illustration of the DIP for the boxcar deconvolution operator with $a = 2/(\sqrt5 + 1)$. Using a log-scale along the vertical axis, the function $k \mapsto \ldots$ is depicted for $k = 0, 1, 2, \ldots, 500$ (oscillating solid line). For comparison purposes we also depict the corresponding function for a homogeneous operator with DIP $= 1, 1.5, 2$, taking eigenvalues $r_k = ck^{-\alpha}$, where $\alpha = 1, 1.5, 2$ and $c = 0.58$ (smooth dashed curves).


Returning to boxcar deconvolution, we note that the envelope $|r_k| \le c/|k|$ corresponds to an effective DIP of $\alpha = 1$. The question studied in this paper is whether the oscillations in $r_k$ of (3) increase the DIP. Compare Figure 1.

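The answer turns out to depend on the function class. The main results, Theorems 1 and 2, can be expressed as saying, so long as logarithmic terms are ignored, that for ellipsoids and almost all irrational $a$,
$$\alpha(K_a, \Theta^\sigma) = \tfrac32 \qquad \text{for all } \sigma > 0,$$
while for hyperrectangles,
$$\alpha(K_a, \mathcal{H}^\sigma) = \begin{cases} 1, & \text{if } 0 < \sigma < \tfrac32, \\[2pt] \tfrac32 - \dfrac{2}{2\sigma + 1}, & \text{if } \sigma \ge \tfrac32. \end{cases} \tag{16}$$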
Thus, the DIP of boxcar deconvolution lies between 1 and $\tfrac32$, and is better (i.e., smaller) for more uniform smoothness (hyperrectangles) and for smaller $\sigma$.

Remark 4. We caution that the literature contains other definitions of the DIP of an inverse problem: for example, in Mathé and Pereverzev (2001), it refers to a numerical index of distance from invertibility. While these notions are certainly related, the definition used here is simply a convenience for interpreting results stated formally in Sections 3 and 4: it refers to the drop in rate of convergence due to the presence of the decaying eigenvalues $r_k$.

Remark 5. There is an elbow in rates at $\sigma = \tfrac32$ for hyperrectangles but not for ellipsoids. This contrasts with results obtained for homogeneous operators (14). Observe that the rates of convergence are worse for ellipsoids than for corresponding hyperrectangles; this occurs because the uniform hyperrectangle constraint (12) operates on each coordinate and so provides less scope for maximizing risk by concentrating signal energy in coordinates where $\|ka\|$ is small than does the ellipsoid case, where only a total energy constraint (13) applies.

2. Preliminaries. 

2.1. Diophantine approximations. We recall some pertinent parts of the classical theory, referring to Lang (1966) and Khinchin (1992) for further details. The study of approximations such as (8) is connected to the approximation of irrationals by rationals known as Diophantine approximation. For a given irrational number $a$, we distinguish the systematic approximations $\|ka\|$, $k = 1, 2, \ldots$, of (8) from the best rational approximations $p/q$: by best approximation we mean that
$$|qa - p| < \min_{1 \le k < q}\|ka\|. \tag{17}$$
Given the sequence of solutions $(p_n, q_n)$ to (17), the rate of approximation is defined in terms of the decay of
$$D(a, q_n) = \Big|a - \frac{p_n}{q_n}\Big|. \tag{18}$$

Apart from the two basic groups of real numbers, rationals and irrationals, there exists a much finer division of irrational numbers based upon the degree to which they can be approximated by rational fractions. This may range from $O(1/q^2)$ to arbitrarily much faster, as explained below. These rates depend crucially on the best possible rational approximation (17). The solution of (17) is given by the continued fraction algorithm which, unlike systematic fractions ($\|ka\|/k$, $k = 1, 2, \ldots$), captures the arithmetic properties of the number to be approximated.
2.2. Continued fractions and convergents. Any irrational number $a$ may be uniquely determined by its continued fraction expansion
$$a = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} = [a_0; a_1, a_2, \ldots], \tag{19}$$
where $a_0$ is an integer and $a_1, a_2, \ldots$ is an infinite sequence of strictly positive integers. In the algorithm (19) the numbers $a_k$ are called the elements or partial denominators. To each infinite sequence $(a_k)$ corresponds a unique irrational number $a$ and vice versa. At stage $n$ the algorithm uses only the first $n + 1$ elements $[a_0; a_1, a_2, \ldots, a_n]$. For such a terminating continued fraction only a finite number of operations are involved and the result is clearly a rational number:
$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}} = [a_0; a_1, a_2, \ldots, a_n] = \frac{p_n}{q_n}. \tag{20}$$
The rational numbers $p_n/q_n$, $n = 0, 1, \ldots$, are called the convergents of $a$.
Returning to the problem of approximating an irrational number $a$ by rationals, we have that, for $n \ge 1$,
$$\inf_{1 \le k \le q_n}\|ka\| = |q_na - p_n| = \|q_na\|. \tag{21}$$


In words, the convergents satisfy the best-approximation property (17). Indeed, any best approximation is a convergent since, for $n \ge 1$, $q_n$ is the smallest integer $q > q_{n-1}$ such that $\|qa\| < \|q_{n-1}a\|$ [see, e.g., Lang (1966), page 9]. The quality of best approximation is given by
$$\frac{1}{2q_{n+1}} < \|q_na\| < \frac{1}{q_{n+1}} \tag{22}$$
[Lang (1966), page 8]. For systematic approximation, with $1 \le k < q_{n+1}$, Lang [(1966), page 10] shows that
$$\|ka\| \ge \|q_na\| > \frac{1}{2q_{n+1}}. \tag{23}$$

It is informative to note that, for $n \ge 2$, the algorithm (20) can be written as
$$q_n = a_nq_{n-1} + q_{n-2}, \qquad p_n = a_np_{n-1} + p_{n-2}, \tag{24}$$
from which follow some basic properties of the convergents of all irrational numbers $a$:

(i) The denominators $q_n$ grow at least geometrically:
$$q_n \ge 2^{(n-1)/2}, \qquad n \ge 1. \tag{25}$$
(ii) For all $n \ge 1$,
$$a_n \le \frac{q_n}{q_{n-1}} \le a_n + 1.$$
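The recurrence (24) and properties (i) and (ii) can be illustrated with a short sketch. It assumes the input is an exact `Fraction` (for instance, the exact value of a floating-point approximation of an irrational), so only the first dozen or so elements are meaningful; the function names are hypothetical.

```python
import math
from fractions import Fraction


def cf_elements(x, n):
    """Elements [a_0; a_1, ..., a_n] of the continued fraction (19) of x (a Fraction)."""
    elems = []
    for _ in range(n + 1):
        a_k = math.floor(x)
        elems.append(a_k)
        if x == a_k:            # terminating expansion: x is rational
            break
        x = 1 / (x - a_k)
    return elems


def convergents(elems):
    """Convergents (p_n, q_n) from the recurrence (24), seeded with
    p_{-1} = 1, q_{-1} = 0, p_{-2} = 0, q_{-2} = 1."""
    p2, p1, q2, q1 = 0, 1, 1, 0
    out = []
    for a_k in elems:
        p, q = a_k * p1 + p2, a_k * q1 + q2
        out.append((p, q))
        p2, p1, q2, q1 = p1, p, q1, q
    return out
```

For $a = \sqrt2 - 1 = [0; 2, 2, 2, \ldots]$, the denominators $1, 2, 5, 12, 29, \ldots$ satisfy both (25) and $a_n \le q_n/q_{n-1} \le a_n + 1$.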


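The qualitative nature of rational approximations can, therefore, be measured by the size of the elements in the continued fraction algorithm: from (22),
$$\frac{1}{2q_n^2(a_{n+1} + 1)} < D(a, q_n) < \frac{1}{q_n^2a_{n+1}}. \tag{26}$$
Faster approximation will occur for those irrationals with larger elements $a_n$, and vice versa. Families of irrational numbers can be defined according to the size of their elements.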


Faster approximation will occur for those irrationals with larger elements 
an and vice versa. Families of irrational numbers can be defined according 
to the size of their elements. 


Definition 1. We say that an irrational number a is badly approx- 
imable (BA) if 


sup On (a) < oo. 

n 

From (26), we see that arbitrarily fast rates of approximation are possible when the elements $a_{n+1}$ grow quickly, whereas for BA numbers $D(a, q_n) \asymp q_n^{-2}$.

A natural question arises: are there general laws which govern the approximations of classical irrational numbers? Again, some answers follow from the continued fraction algorithm [Khinchin (1992), Chapter II]. One class of results concerns algebraic numbers, that is, roots of polynomials with integer coefficients. For example, it can be shown that quadratic irrationals (such as $\sqrt5$) have eventually periodic elements and so are BA. And cubic irrationals (e.g., $5^{1/3}$) cannot be approximated with a rate faster than $1/q^3$.

Another class of results constitutes the “measure theory” of continued fractions. For example, almost all numbers (i.e., all except a set of Lebesgue measure zero) have unbounded elements $a_n$ [Khinchin (1992), Theorem 30]. On the other hand, for almost all numbers, it is also true that the rate of approximation can be no faster than $O\big(1/(q_n^2(\log q_n)^{1+\delta})\big)$, $\delta > 0$. For us, an important consequence (see the Appendix) is the following. For each $\delta > 0$, there is a set $A_\delta$ of full measure such that
$$q_{n+1} > q_n\log q_n \quad \text{infinitely often}, \tag{27}$$
and yet
$$q_{n+1} < q_n(\log q_n)^{1+\delta} \quad \text{for all large } n > n(a). \tag{28}$$
Henceforth, the usage “almost all $a$” means “for all $a$ in $A_\delta$.”







I. M. JOHNSTONE AND M. RAIMONDO 


2.3. Minimax risk. We recall some basic results, established for the direct data setting $r_k \equiv 1$ (or $\epsilon_k \equiv \epsilon$) in Donoho, Liu and MacGibbon (1990), and easily extended to the indirect setting (10) (see the Appendix). If $\Theta$ is compact, orthosymmetric and quadratically convex, then
$$R_N(\Theta,\epsilon) \le R_L(\Theta,\epsilon) \le \mu^* R_N(\Theta,\epsilon), \tag{29}$$
where $\mu^* \le 1.25$ is the Ibragimov–Khasminskii constant; see Donoho, Liu and MacGibbon (1990). For such sets, we also have
$$\tfrac12 R_P(\Theta,\epsilon) \le R_L(\Theta,\epsilon) \le R_P(\Theta,\epsilon),$$
where we define
$$R_P(\Theta,\epsilon) = \sup_{\theta\in\Theta}\sum_k \theta_k^2 \wedge \epsilon_k^2. \tag{30}$$
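For a hyperrectangle such as (12), both $R_L$ and $R_P$ decouple across coordinates: $R_P = \sum_k \tau_k^2 \wedge \epsilon_k^2$, where $\tau_k$ is the side half-length in coordinate $k$, and $R_L$ is the sum of one-dimensional linear minimax risks. The sketch below, which assumes a single real Gaussian coordinate $y \sim N(\theta, \epsilon^2)$ with $|\theta| \le \tau$, computes that one-dimensional risk in closed form and by brute-force search over the shrinkage factor; the elementary inequality $\tfrac12\min(x,y) \le xy/(x+y) \le \min(x,y)$ is where the factor $\tfrac12$ comes from. Function names are hypothetical.

```python
def linear_minimax_1d(tau_sq, eps_sq):
    """inf_c sup_{|theta| <= tau} E(c*y - theta)^2 for y ~ N(theta, eps^2).

    The risk c^2 eps^2 + (1 - c)^2 theta^2 is largest at |theta| = tau and is
    then minimized at c = tau^2 / (tau^2 + eps^2).
    """
    return tau_sq * eps_sq / (tau_sq + eps_sq)


def brute_force_1d(tau_sq, eps_sq, steps=10_000):
    """Grid search over the shrinkage factor c in [0, 1], for comparison."""
    return min(c * c * eps_sq + (1 - c) ** 2 * tau_sq
               for c in (i / steps for i in range(steps + 1)))
```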

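In the light of bounds (7) and Remark 2, our task is, then, to evaluate $R_P(\Theta,\epsilon)$ for selected $\Theta$, small $\epsilon$ and $k \in \mathbb{N}_+$, for the boxcar operator, which has
$$\frac{\epsilon ka}{\|ka\|} \le \epsilon_k \le \frac{\pi}{2}\,\frac{\epsilon ka}{\|ka\|}. \tag{31}$$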

2.4. An equidistribution lemma. While precise bounds (22) are available for best possible rational approximations to an irrational number $a$, the quality of systematic rational approximations $\|ka\|$, $k = 1, 2, \ldots$, changes considerably as $k$ varies. As a result, $r_k$ and $\epsilon_k$ oscillate widely as $k$ changes; see Figure 1. However, the average behavior is much less susceptible to fluctuations. Indeed, as $k$ runs over a block of length $q$, the values of $\|ka\|$ have a distribution that is in certain respects close to discrete uniform on
$$\{q^{-1}, 2q^{-1}, \ldots, \tfrac12\}.$$

Lemma 1. Let $p/q$ and $p'/q'$ be successive principal convergents in the continued fraction expansion of a real number $a$. Let $N$ be a positive integer with $N + q < q'$. Let $h$ be a nonnegative, nonincreasing function. Then we have upper and lower bounds
$$\sum_{\mu=4}^{q} h(\mu/q) \le \sum_{k=N+1}^{N+q} h(\|ka\|) \le 2\sum_{\mu=1}^{q-3} h(\mu/q) + 6h(1/(2q')). \tag{32}$$

Proof. The argument is a modification of that used by Lang [(1966), page 37]. Since $p/q$ is a principal convergent, we may write $a$ in the form $a = p/q + \delta/q^2$ with $|\delta| < 1$. Writing $k = N + \nu$ with $\nu = 1, \ldots, q$, one gets
$$ka = Na + \nu p/q + \varepsilon_\nu, \qquad |\varepsilon_\nu| < 1/q.$$
Since $p$ and $q$ are relatively prime, the sets $\{\nu p/q : \nu = 1, \ldots, q\}$ and $\{\mu/q : \mu = 0, \ldots, q-1\}$ are equal modulo $\mathbb{Z}$. To each $k$ there is associated a unique $\nu$ and, hence, $\mu$, and setting $x_\mu = Na + \mu/q$, we have
$$ka = x_\mu + \varepsilon_\nu \pmod{\mathbb{Z}}.$$
The points $\{x_\mu : \mu = 1, \ldots, q\}$ form an equispaced set (modulo $\mathbb{Z}$) with exactly one point in each interval $I_\mu = [(\mu-1)/q, \mu/q)$, $\mu = 1, \ldots, q$; relabeling if necessary, we take $x_\mu \in I_\mu$.

Let $R(\xi) = \xi - [\xi]$ denote the remainder of a real number $\xi$. Consider first the set $\mathcal{K}_1$ of indices $k$ for which the corresponding points $x_\mu$ lie in $I_q \cup I_1 \cup I_2$; clearly, $|\mathcal{K}_1| = 3$. Since $k < q'$, we have from the remark following (22) that $R(ka) \ge \|ka\| > 1/(2q')$. Hence, the sum of $h(R(ka))$, for $k \in \mathcal{K}_1$, is bounded by $3h(1/(2q'))$.

Let $\mathcal{K}_2$ be the set of remaining indices $k$ in $\{N+1, \ldots, N+q\}$, so that the corresponding points lie in $I_3 \cup \cdots \cup I_{q-1}$. Since all $|\varepsilon_\nu| < 1/q$, each of the right endpoints of $I_1, \ldots, I_{q-3}$ is a lower bound for exactly one $R(ka)$, $k \in \mathcal{K}_2$, and the right endpoints of $I_4, \ldots, I_q$ each are upper bounds for exactly one $R(ka)$.

Combining this with the upper bound for $\mathcal{K}_1$ (and, for the lower bound, the nonnegativity of $h$), we obtain
$$\sum_{\mu=4}^{q} h(\mu/q) \le \sum_{k=N+1}^{N+q} h(R(ka)) \le \sum_{\mu=1}^{q-3} h(\mu/q) + 3h(1/(2q')). \tag{33}$$
This inequality remains valid if we replace $h(R(ka))$ by $h(1 - R(ka))$; indeed, the proof is simply “reflected about $\tfrac12$,” and we note that for $k$ in the (reflected) $\mathcal{K}_1$, we have $1 - R(ka) \ge \|ka\| > 1/(2q')$. Since $\|x\| = \min\{R(x), 1 - R(x)\}$, we have
$$h(\|x\|) = \max\{h(R(x)), h(1 - R(x))\},$$
and using $(a + b)/2 \le \max\{a, b\} \le a + b$ for $a, b \ge 0$, the lemma follows from (33) applied to $R(ka)$ and $1 - R(ka)$. $\square$

Remark 6. The proof shows that the upper bound continues to hold if the middle sum is taken over $N + 1 \le k \le N + k_0$, where $k_0 \le q$ and we assume only $N + k_0 < q'$.

Remark 7. The bounds provided by this lemma are often sharp up to constants. For example, if $a$ is BA and $h(x) = 1/x$,
$$\sum_{k=N+1}^{N+q}\|ka\|^{-1} \asymp q\log q.$$
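The two-sided bound (32) is easy to test numerically. The sketch below assumes that `q` and `q_next` are successive convergent denominators of `a` (for $\sqrt2 - 1$ these include 70, 169, 408; for $(\sqrt5-1)/2$ they include 55, 89) and that $N + q < q'$; the function name is hypothetical, and floating point is adequate at these sizes.

```python
import math


def dist_to_nearest_int(x):
    """||x||: distance from x to the nearest integer."""
    return abs(x - round(x))


def lemma1_bounds_hold(a, q, q_next, N, h):
    """Check (32) for sum_{k=N+1}^{N+q} h(||k a||).

    Assumes q, q_next are successive convergent denominators of a, that
    N + q < q_next, and that h is nonnegative and nonincreasing on (0, 1].
    """
    middle = sum(h(dist_to_nearest_int(k * a)) for k in range(N + 1, N + q + 1))
    lower = sum(h(mu / q) for mu in range(4, q + 1))
    upper = 2 * sum(h(mu / q) for mu in range(1, q - 2)) + 6 * h(1 / (2 * q_next))
    return lower <= middle <= upper
```

With $h(x) = 1/x$, the middle sum grows like $q\log q$ for BA $a$, in line with Remark 7.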


3. Hyperrectangles. 
3.1. Statement and outline. To state the main results, introduce two rate constants
$$r = \frac{\sigma + 1/2}{\sigma + 5/2}, \qquad \bar r = \frac{\sigma}{\sigma + 3/2},$$
and note that $r < \bar r$ if and only if $\sigma > 3/2$. More precise results are possible in the BA case, while for generic irrationals, the consequences (27) and (28) of Khinchin’s theorem lead to only slightly weaker statements.


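Theorem 1. For BA $a$ we have
$$R_N(\mathcal{H}^\sigma(C),\epsilon) \asymp \begin{cases} C^{2(1-r)}\epsilon^{2r}, & \text{if } \sigma > \tfrac32, \\ C\epsilon\log(C/\epsilon), & \text{if } \sigma = \tfrac32, \\ C^{2(1-\bar r)}\epsilon^{2\bar r}, & \text{if } 0 < \sigma < \tfrac32. \end{cases} \tag{34}$$
For almost all $a$, the previous bounds remain valid for $0 < \sigma < \tfrac32$, while for $\sigma > \tfrac32$, for each $\delta > 0$,
$$R_N(\mathcal{H}^\sigma(C),\epsilon) \begin{cases} \le c_2(\log(C/\epsilon))^{5+5\delta}C^{2(1-r)}\epsilon^{2r}, & \text{for all small } \epsilon, \\ \ge c_1(\log(C/\epsilon))^{2r}C^{2(1-r)}\epsilon^{2r}, & \text{for infinitely many } \epsilon. \end{cases} \tag{35}$$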


There is thus an “elbow” in the rates of convergence at $\sigma = \tfrac32$. Comparison with (14) shows that for $\sigma < \tfrac32$, the DIP is $\alpha = 1$ (as if the sinusoidal term were not present in $r_k$). However, for $\sigma > \tfrac32$, the DIP given by (16) increases gradually from 1 to a limiting value of $\tfrac32$ for large $\sigma$.
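The arithmetic connecting Theorems 1 and 2 to the DIP values in (16) can be checked exactly with rational arithmetic. This sketch simply plugs the rate exponents into definition (15); it assumes the BA case (no logarithmic factors), and the function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def dip(sigma, rate):
    """Effective DIP (15): alpha = sigma * (1/rate - 1/r_D), with r_D = sigma/(sigma + 1/2)."""
    r_direct = sigma / (sigma + HALF)
    return sigma * (1 / rate - 1 / r_direct)


def hyperrectangle_rate(sigma):
    """Rate exponent of Theorem 1 (BA case): the smaller of r and r_bar."""
    r = (sigma + HALF) / (sigma + Fraction(5, 2))
    r_bar = sigma / (sigma + Fraction(3, 2))
    return min(r, r_bar)


def ellipsoid_rate(sigma):
    """Rate exponent r_tilde = sigma / (sigma + 2) of Theorem 2."""
    return sigma / (sigma + 2)
```

For instance, `dip(Fraction(5), hyperrectangle_rate(Fraction(5)))` equals $\tfrac32 - \tfrac{2}{11}$, and the ellipsoid DIP is $\tfrac32$ for every $\sigma$.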

This result does not cover irrationals with fast rates of approximation (e.g., $O(1/q^3)$ or faster, as discussed in Section 2.2), but, of course, such numbers form a set of Lebesgue measure zero.

We outline the main steps of the proof, with details to follow in Section 3.3. First, as a notational convention, we introduce a parameter $\tau = \sigma + \tfrac12$, so that $\Theta = \mathcal{H}^{\tau-1/2}(C) = \{\theta : |\theta_k| \le Ck^{-\tau}\}$. With these conventions, (30) becomes
$$R_P(\Theta,\epsilon) = \sum_{k>0} C^2k^{-2\tau} \wedge \epsilon_k^2 := \sum_{k>0} m_k(\epsilon). \tag{36}$$
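A direct (truncated) evaluation of (36) is a useful sanity check on the zone analysis that follows. This sketch uses the exact noise levels $\epsilon_k = \epsilon/|r_k|$ rather than the bounds (31), and truncates the sum at `kmax`; the omitted tail is at most $C^2\,k_{\max}^{1-2\tau}/(2\tau - 1)$ for $\tau > \tfrac12$. The parameter choices in the usage note ($a = 2/(\sqrt5+1)$, $C = 1$, $\tau = 2$) mirror those of Figure 2; the function names are hypothetical.

```python
import math


def m_k(k, eps, C, tau, a):
    """Summand m_k(eps) = C^2 k^{-2 tau} ∧ eps_k^2 of (36), with eps_k = eps / |r_k|."""
    x = math.pi * k * a
    r_k = abs(math.sin(x) / x)
    return min(C ** 2 * k ** (-2 * tau), (eps / r_k) ** 2)


def R_P_truncated(eps, C, tau, a, kmax):
    """Partial sum of (36) over 1 <= k <= kmax."""
    return sum(m_k(k, eps, C, tau, a) for k in range(1, kmax + 1))
```

Usage: `R_P_truncated(1e-4, 1.0, 2.0, 2 / (math.sqrt(5) + 1), 5000)`; the value is nondecreasing in $\epsilon$ and never exceeds $C^2\sum_k k^{-2\tau}$.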

Next, we use the continued fraction approximations $p_n/q_n$, $n = 0, 1, 2, \ldots$, to $a$, and for frequencies near $q_n$, split the sum into blocks of length $q_n$. Thus,
$$\sum_{k>0} m_k(\epsilon) = \sum_{\text{blocks}}\ \sum_{k\in\text{block}} m_k(\epsilon), \tag{37}$$
where $\sum_{\text{blocks}}$ is the sum over all blocks as $n$ varies, the blocks being of length $q_n$ between $q_n$ and $q_{n+1}$. We then apply the equidistribution lemma to the sum within blocks. The block sums are then collected into one of three zones:
$$R_P(\Theta,\epsilon) = \sum_k m_k(\epsilon) = V(\epsilon) + M(\epsilon) + B(\epsilon). \tag{38}$$
These zones (variance, mixed and bias) are illustrated in Figure 2 and defined formally at (45).
3.2. Frequency partitions determined by an irrational. Any irrational number $a$ defines a unique sequence of convergents $p_n/q_n$, with $1 = q_0 \le q_1 < \cdots < q_n < q_{n+1} < \cdots$. Define $l_n \ge 1$ as the largest integer strictly less than $q_{n+1}/q_n$; thus,
$$l_nq_n < q_{n+1} \le (l_n + 1)q_n.$$
Consider a nonuniform grid
$$\ldots, q_n, 2q_n, \ldots, l_nq_n, q_{n+1}, 2q_{n+1}, \ldots, l_{n+1}q_{n+1}, q_{n+2}, \ldots.$$
Introduce indices $\nu = (n, l)$, $l = 1, \ldots, l_n$, $n = 1, 2, \ldots$. The bivariate indices $\nu = (n, l)$ are totally ordered by lexicographic ordering, and we refer to their components by the functions $n(\nu)$, $l(\nu)$. Furthermore, each index $\nu$ has an immediate successor, which in slight abuse of notation we denote by $\nu + 1$. So our grid is
$$N_\nu = l(\nu)\,q_{n(\nu)}. \tag{39}$$



Fig. 2. An illustration of the variance, mixed and bias zones. Using a log-scale along the vertical axis, the plot shows both functions $k \mapsto \epsilon_k^2$ (oscillating dotted curve) and $k \mapsto C^2k^{-2\tau}$ (smooth dashed curve), with $a = 2/(\sqrt5 + 1)$, $\epsilon = 10^{-\ldots}$, $C = 1$ and $\tau = 2$, which corresponds to $\sigma = 3/2$. Solid vertical lines indicate the borders of the key zones. The thick solid line plots $k \mapsto m_k(\epsilon) = C^2k^{-2\tau} \wedge \epsilon_k^2$.
This grid defines a partition of $\mathbb{N}_+$ by blocks which, between $q_n$ and $q_{n+1}$, have length at most $q_n$:
$$\mathbb{N}_+ = \bigcup_\nu B_\nu, \qquad B_\nu = [N_\nu, N_{\nu+1}). \tag{40}$$
Clearly,
$$|B_\nu| = N_{\nu+1} - N_\nu = q_{n(\nu)} \quad \text{unless } l(\nu) = l_{n(\nu)}.$$
To simplify certain calculations we use blocks of length $q_{n(\nu)}$ only, introducing
$$C_\nu = [N_\nu, N_\nu + q_{n(\nu)}) \supset B_\nu. \tag{41}$$
By construction, for a given integer $k$, there are at most two $C_\nu$ such that $k \in C_\nu$. Hence, summing over all $C_\nu$ in place of $B_\nu$ will only affect the rate by a multiplicative constant of at most 2.
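The grid (39) and the partition (40) can be generated from a list of convergent denominators. This sketch takes the denominators as given (for example, from the recurrence (24)), skips any $n$ with $q_{n+1} = q_n$ (which can only happen for $n = 0$), and lets the caller check the two facts used below: every block $B_\nu$ has length at most $q_{n(\nu)}$, and $N_{\nu+1} \le 2N_\nu$. The function name is hypothetical.

```python
def frequency_grid(qs):
    """Grid points N_nu = l * q_n, 1 <= l <= l_n, as in (39), where l_n is the
    largest integer strictly less than q_{n+1}/q_n. Returns (N_nu, n) pairs."""
    grid = []
    for n in range(len(qs) - 1):
        q, q_next = qs[n], qs[n + 1]
        l_n = (q_next - 1) // q      # largest l with l * q < q_next
        for l in range(1, l_n + 1):
            grid.append((l * q, n))
    return grid
```

For $a = \sqrt2 - 1$ with denominators $1, 2, 5, 12, \ldots, 408$, the grid is $1, 2, 4, 5, 10, 12, 24, \ldots$, and the blocks $[N_\nu, N_{\nu+1})$ tile $\{1, \ldots, 407\}$.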


3.3. Proof of Theorem 1. 

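3.3.1. Key zones and bounds. First, recall that $m_k(\epsilon)$ is defined at (36) and use bounds (31); by construction $q_{n(\nu)} \le N_\nu$, so that for $k$ in a block $[N_\nu, N_\nu + q_{n(\nu)}]$, we have $N_\nu \le k \le 2N_\nu$; hence,
$$m_k(\epsilon) \asymp C^2N_\nu^{-2\tau} \wedge \epsilon^2N_\nu^2\|ka\|^{-2} =: h_N(\|ka\|). \tag{42}$$
We suppress the index $\nu$ when not necessary. From the equidistribution lemma,
$$\sum_{\mu=4}^{q} h_N(\mu/q) \le \sum_{k\in C_\nu} h_N(\|ka\|) \le c\sum_{\mu=1}^{q} h_N(\mu/q) + c\,h_N(1/(2q')). \tag{43}$$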

To estimate these sums, we use an easily verified bound.



Lemma 2. If q>2r and k> 0, then 



where the constants needed for $\asymp$ depend only on $r$.

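Now apply this to $h_N(x) = C^2N^{-2\tau} \wedge \epsilon^2N^2x^{-2}$. Writing also $\tilde\epsilon = \epsilon/C$, we obtain
$$\sum_{\mu=1}^{q} h_N(\mu/q) \asymp C^2N^{-2\tau}\min\{\tilde\epsilon^2N^{2(1+\tau)}q^2,\ \tilde\epsilon N^{1+\tau}q,\ q\}. \tag{44}$$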
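The following sketch does not reproduce Lemma 2 itself; it only checks numerically the elementary estimate that (44) rests on. After factoring out $C^2N^{-2\tau}$ and writing $\lambda = \tilde\epsilon N^{1+\tau}q$, the block sum becomes $\sum_{\mu=1}^{q}\min(1, \lambda^2/\mu^2)$, which lies within the factors $[\tfrac12, 3]$ of $\min\{\lambda^2, \lambda, q\}$ (these constants are our own elementary choice). The three branches of the minimum are exactly the variance, mixed and bias regimes. Function names are hypothetical.

```python
def block_sum(lam, q):
    """sum_{mu=1}^{q} min(1, lam^2 / mu^2): the left side of (44) divided by
    C^2 N^{-2 tau}, with lam = eps_tilde * N^(1 + tau) * q."""
    return sum(min(1.0, (lam / mu) ** 2) for mu in range(1, q + 1))


def three_regime(lam, q):
    """min{lam^2, lam, q}: the right side of (44) up to the same factor."""
    return min(lam ** 2, lam, q)
```

For $\lambda \le 1$ the ratio lies in $[1, \pi^2/6]$, for $1 < \lambda < q$ it lies in $[\tfrac12, 3]$, and for $\lambda \ge q$ it equals 1.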
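We can now formally define the zone to which a block $B_\nu$ (or $C_\nu$) belongs in terms of the value of $\tilde\epsilon N_\nu^{1+\tau}q_{n(\nu)}$. Again suppressing the subscript $\nu$, we say
$$\nu \in \begin{cases} \mathcal{V} \text{ (variance zone)}, & \text{if } \tilde\epsilon N^{1+\tau}q < 1, \\ \mathcal{M} \text{ (mixed zone)}, & \text{if } 1 \le \tilde\epsilon N^{1+\tau}q \le q, \\ \mathcal{B} \text{ (bias zone)}, & \text{if } \tilde\epsilon N^{1+\tau}q > q. \end{cases} \tag{45}$$
Thus, the zone describes which term attains the minimum in (44). Let $\nu_0 < \nu_1$ be the last indices for which $\tilde\epsilon N_\nu^{1+\tau}q_{n(\nu)} < 1$ and $\tilde\epsilon N_\nu^{1+\tau} \le 1$, respectively, and set
$$k_0(\epsilon) = N_{\nu_0+1} \quad \text{and} \quad k_1(\epsilon) = N_{\nu_1+1}. \tag{46}$$
Frequencies $k < k_0$ lie in the variance zone, those with $k_0 \le k < k_1$ in the mixed zone, and those with $k \ge k_1$ in the bias zone.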

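Consider now the second term in the upper bound of (43):
$$h_N(1/(2q')) = C^2N^{-2\tau}\big(1 \wedge (2\tilde\epsilon N^{1+\tau}q')^2\big).$$
If $\tilde\epsilon N^{1+\tau}q \ge 1$, then, of course, so is $\tilde\epsilon N^{1+\tau}q'$, and so $h_N(1/(2q')) = C^2N^{-2\tau}$ can be ignored in comparison with (44), whose right side is then at least $C^2N^{-2\tau}$. On the other hand, if $\tilde\epsilon N^{1+\tau}q < 1$, then $h_N(1/(2q')) \le 4\epsilon^2N^2(q')^2$, and this bound dominates $\epsilon^2N^2q^2$. In summary, we have derived the following key bounds:
$$\sum_{k\in C_\nu} m_k(\epsilon) \begin{cases} \le c\epsilon^2N^2(q')^2, & \nu \in \mathcal{V} \text{ (variance zone)}, \\ \asymp C\epsilon N^{1-\tau}q, & \nu \in \mathcal{M} \text{ (mixed zone)}, \\ \asymp C^2N^{-2\tau}q, & \nu \in \mathcal{B} \text{ (bias zone)}. \end{cases} \tag{47}$$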

The variance zone. Consider first values $k < k_0(\epsilon)$, for which the contribution to the minimax risk is due to oscillations occasioned by Diophantine approximation only. Here the first bound of (47) applies, and the hyperrectangle constraint $|\theta_k| \le Ck^{-\tau}$ does not yet have any smoothing effect.

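We first derive an expression for $k_0$ in terms of $\epsilon$. Since $\nu_0 + 1$ does not lie in the variance zone and $q_{n(\nu_0+1)} \le N_{\nu_0+1} = k_0$, we have, by definition,
$$1 \le \tilde\epsilon N_{\nu_0+1}^{1+\tau}q_{n(\nu_0+1)} \le \tilde\epsilon k_0^{2+\tau},$$
so $k_0 \ge \tilde\epsilon^{-1/(2+\tau)}$. On the other hand, again by definition, $\tilde\epsilon^{-1} > N_{\nu_0}^{1+\tau}q_{n(\nu_0)} \ge q_{n(\nu_0)}^{2+\tau}$. Writing $L_n$ for $q_{n+1}/q_n$, we obtain
$$k_0 = N_{\nu_0+1} \le q_{n(\nu_0)+1} = L_{n(\nu_0)}q_{n(\nu_0)} \le L_{n(\nu_0)}\tilde\epsilon^{-1/(2+\tau)}.$$
For BA $a$, $L_n \le c$, while for almost all $a$ and all large $n$, (28) shows that $L_n < (\log q_n)^{1+\delta}$. To summarize,
$$k_0 \asymp (C/\epsilon)^{1/(2+\tau)} \quad \text{for BA } a, \qquad k_0 \le c(C/\epsilon)^{1/(2+\tau)}(\log(C/\epsilon))^{1+\delta} \quad \text{for almost all } a.$$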
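First, sum over blocks using partition (40) and apply bound (47) in the variance zone:
$$V(\epsilon) = \sum_{k=1}^{k_0-1} m_k(\epsilon) = \sum_{\nu\le\nu_0}\sum_{k\in B_\nu} m_k(\epsilon) \le c\epsilon^2\sum_{\nu\le\nu_0} N_\nu^2q_{n(\nu)+1}^2. \tag{48}$$
Using grid (39), and setting $\nu_0 = (n_0, l)$, $l \le l_{n_0}$, we obtain
$$V(\epsilon) \le c\epsilon^2\sum_{n=1}^{n_0}\sum_{l=1}^{l_n}(lq_n)^2q_{n+1}^2 \le c\epsilon^2\sum_{n=1}^{n_0} l_n^3q_n^2q_{n+1}^2 \le c\epsilon^2\bar L_{n_0}^5\sum_{n=1}^{n_0} q_n^4, \tag{49}$$
where we have set $\bar L_{n_0} = \max\{L_n : n \le n_0\}$.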

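The denominators $q_n$ grow at least exponentially [cf. (25)], and so, using $q_{n_0}^{2+\tau} \le N_{\nu_0}^{1+\tau}q_{n_0} < \tilde\epsilon^{-1}$, we find
$$\epsilon^2\sum_{n=1}^{n_0} q_n^4 \le c\epsilon^2q_{n_0}^4 \le c\epsilon^2(C/\epsilon)^{4/(2+\tau)} = cC^{2(1-r)}\epsilon^{2r}.$$
In the BA case, $\bar L_{n_0} \le c$, while for almost all $a$ we have $\bar L_{n_0} \le c(\log q_{n_0})^{1+\delta} \le c(\log(C/\epsilon))^{1+\delta}$. In summary,
$$V(\epsilon) \le \begin{cases} cC^{2(1-r)}\epsilon^{2r}, & \text{for BA } a, \\ c(\log(C/\epsilon))^{5+5\delta}C^{2(1-r)}\epsilon^{2r}, & \text{for almost all } a. \end{cases} \tag{50}$$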


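The mixed zone. We are now interested in indices $k \in [k_0, k_1)$, where both oscillations and the hyperrectangle constraint contribute to the minimax risk; the zone ends where the oscillations stop. By definition, $k_1 = N_{\nu_1+1}$ satisfies $N_{\nu_1} \le \tilde\epsilon^{-1/(1+\tau)} < N_{\nu_1+1}$. Since always $N_{\nu+1} \le 2N_\nu$, it follows that
$$k_1 \asymp \tilde\epsilon^{-1/(1+\tau)} = (C/\epsilon)^{1/(1+\tau)}.$$
Using bound (47) in the mixed zone, together with $|C_\nu| = q_{n(\nu)}$ and $N \le k \le 2N$, yields
$$\sum_{k\in C_\nu} m_k(\epsilon) \asymp C\epsilon N^{1-\tau}q_{n(\nu)} \asymp C\epsilon\sum_{k\in C_\nu} k^{1-\tau},$$
which shows that for sums over blocks of length $q_n$ in the mixed zone, we may replace $m_k(\epsilon)$ by $C\epsilon k^{1-\tau}$. Since the blocks $C_\nu$ form a cover of the integers $k_0, \ldots, k_1 - 1$ of redundancy at most two,
$$M(\epsilon) = \sum_{k=k_0}^{k_1-1} m_k(\epsilon) \asymp C\epsilon\sum_{k=k_0}^{k_1-1} k^{1-\tau}.$$
Thus, in the mixed zone,
$$M(\epsilon) \asymp \begin{cases} C\epsilon k_0^{2-\tau} \asymp C^{2(1-r)}\epsilon^{2r}, & \text{if } \tau > 2, \\ C\epsilon\log(k_1/k_0) \asymp C\epsilon\log(C/\epsilon), & \text{if } \tau = 2, \\ C\epsilon k_1^{2-\tau} \asymp C^{2(1-\bar r)}\epsilon^{2\bar r}, & \text{if } \tfrac12 < \tau < 2. \end{cases} \tag{51}$$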
The bias zone. Note that for $k \ge k_1$, since always $\|ka\| \le \tfrac12$, we have $\epsilon^2k^2/\|ka\|^2 \ge \epsilon^2k^2 > C^2k^{-2\tau}$, and so there is no longer any effect of oscillation, and $m_k(\epsilon) \asymp C^2k^{-2\tau}$ in (36). Hence,
$$B(\epsilon) = \sum_{k\ge k_1} m_k(\epsilon) \asymp C^2\sum_{k\ge k_1} k^{-2\tau} \asymp C^2k_1^{1-2\tau} \asymp C^{2(1-\bar r)}\epsilon^{2\bar r}. \tag{52}$$
We emphasize that bounds (51) and (52) apply to all irrationals $a$.

3.3.2. Summary. We return to (38). In the BA case (and also in the a.a. case when $\tfrac12 < \tau < 2$), it is apparent from (50), (51) and (52) that $V + M + B \asymp M$, which establishes (34).

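It remains to consider the a.a. case with $\tau > 2$. The upper bound in (35) is apparent from (50). For the lower bound, let $a$ be an arbitrary irrational with convergents $p_k/q_k$, $k = 0, 1, 2, \ldots$. Simply by choosing $\theta$ to be zero except in the $k$th coordinate, in which $\theta_k = Ck^{-\tau}$, we obtain the elementary lower bound
$$R_P(\Theta,\epsilon) \ge \sup_k C^2k^{-2\tau} \wedge \epsilon_k^2. \tag{53}$$
Since $\epsilon_k \ge \epsilon ka/\|ka\|$, we find using (22) that, for $k = q_n$,
$$C^2k^{-2\tau} \wedge \epsilon_k^2 \ge c\,\big(C^2q_n^{-2\tau} \wedge \epsilon^2q_n^2q_{n+1}^2\big).$$
Using (27) in (53), we deduce that for almost all $a$ there exists a sequence $n_l$ such that
$$R_P(\Theta,\epsilon) \ge c\sup_l C^2q_{n_l}^{-2\tau} \wedge \epsilon^2q_{n_l}^4(\log q_{n_l})^2. \tag{54}$$
Construct a sequence $(\epsilon[l])$, $l = 1, 2, \ldots$, with
$$C^2q_{n_l}^{-2\tau} = \epsilon[l]^2q_{n_l}^4(\log q_{n_l})^2, \quad \text{which gives} \quad q_{n_l} \asymp \big(\tilde\epsilon[l]\log\tilde\epsilon[l]^{-1}\big)^{-1/(2+\tau)}, \tag{55}$$
and using such an $\epsilon[l]$-sequence in (54), together with (55), yields the required bound
$$R_P(\Theta,\epsilon[l]) \ge c\,(\log(C/\epsilon[l]))^{2r}C^{2(1-r)}\epsilon[l]^{2r}.$$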

4. Ellipsoids. For an ellipsoid $\Theta = \Theta^\sigma(C)$ defined as in (13), let $\tilde r = \sigma/(\sigma + 2)$. The goal of this section is to establish the following:

Theorem 2. For $\sigma > 0$ and BA $a$, we have
$$R_N(\Theta^\sigma(C),\epsilon) \asymp C^{2(1-\tilde r)}\epsilon^{2\tilde r}.$$
For almost all $a$, bounds (35) hold for $R_N(\Theta^\sigma(C),\epsilon)$ with $r$ replaced by $\tilde r$, for all $\sigma > 0$.

Since $\tilde r = \sigma/(\sigma + 2)$, the DIP $\alpha(K_a, \Theta^\sigma(C)) = \tfrac32$ for all ellipsoids, regardless of the value of the smoothness index $\sigma$.
Upper bound. As with hyperrectangles, the aim is to use sums over blocks of length about $q_n$. To do so, we define slightly larger ellipsoids based on the partition $\{B_\nu\}$ of (40):
$$\Theta_a = \Theta_a^\sigma(C) = \Big\{\theta : \sum_\nu N_\nu^{2\sigma}\sum_{k\in B_\nu}\theta_k^2 \le C^2\Big\}, \tag{56}$$
where the index $a$ indicates that the grid depends on number-theoretical properties of $a$. By definition (40) of the partition, $k \in B_\nu$ implies that $k \ge N_\nu$, so that $\Theta \subset \Theta_a$ and, hence, $R(\Theta,\epsilon) \le R(\Theta_a,\epsilon)$.

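We may now split the optimization across and within blocks:
$$R_P(\Theta_a,\epsilon) = \sup\Big\{\sum_\nu\sum_{k\in B_\nu}\theta_k^2 \wedge \epsilon_k^2 : \theta \in \Theta_a\Big\} = \sup\Big\{\sum_\nu R_\nu(t_\nu,\epsilon) : \sum_\nu N_\nu^{2\sigma}t_\nu^2 \le C^2\Big\}, \tag{57}$$
where the optimization within block $B_\nu$ is subject to the quota
$$R_\nu(t_\nu,\epsilon) = \sup\Big\{\sum_{k\in B_\nu}\theta_k^2 \wedge \epsilon_k^2 : \sum_{k\in B_\nu}\theta_k^2 \le t_\nu^2\Big\} = \min\Big\{t_\nu^2,\ \sum_{k\in B_\nu}\epsilon_k^2\Big\}. \tag{58}$$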
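The closed form in (58) holds because $\sum_k \theta_k^2 \wedge \epsilon_k^2 \le \min(\sum_k\theta_k^2, \sum_k\epsilon_k^2)$, and filling coordinates greedily up to $\epsilon_k^2$ until the quota $t_\nu^2$ is spent attains that bound. A minimal sketch of this greedy allocation, with a randomized check against feasible points, follows; the function name is hypothetical.

```python
import random


def quota_value(eps_sq, t_sq):
    """Greedy solution of (58): give each coordinate up to eps_k^2 until the
    budget t^2 is spent; the value equals min(t^2, sum(eps_k^2))."""
    value, budget = 0.0, t_sq
    for e in eps_sq:
        take = min(e, budget)
        value += take
        budget -= take
        if budget <= 0:
            break
    return value
```

The order in which coordinates are filled does not affect the value, which is why the equidistribution lemma only needs to control the block total $\sum_{k\in B_\nu}\epsilon_k^2$.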


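The equidistribution lemma can be applied to this last sum: by (31), $\sum_{k\in B_\nu}\epsilon_k^2 \asymp \epsilon^2N_\nu^2\sum_{k\in B_\nu}\|ka\|^{-2}$, and, dropping the subscript $n$, the upper bound in (32) (via Remark 6) with $h(x) = x^{-2}$ gives
$$\sum_{k\in B_\nu}\frac{1}{\|ka\|^2} \le 2\sum_{\mu=1}^{q-3}\frac{q^2}{\mu^2} + 24(q')^2 \le c(q')^2.$$
Hence, from (57) and (58),
$$R_P(\Theta_a,\epsilon) \le c\sup\Big\{\sum_\nu \min\big(t_\nu^2,\ \epsilon^2N_\nu^2(q'_\nu)^2\big) : \sum_\nu N_\nu^{2\sigma}t_\nu^2 \le C^2\Big\}, \tag{59}$$
where $q'_\nu = q_{n(\nu)+1}$.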


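Observe that for positive sequences $(c_\nu)$ and $(d_\nu)$, with $d_\nu$ nondecreasing, the supremum over nonnegative sequences $(u_\nu)$ satisfies
$$\sup\Big\{\sum_\nu \min(u_\nu, c_\nu) : \sum_\nu d_\nu u_\nu \le 1\Big\} \le 2\sum_{\nu\le\nu_0} c_\nu \tag{60}$$
for any value $\nu_0$ for which
$$\sum_{\nu\le\nu_0} c_\nu d_\nu \ge 1. \tag{61}$$
Indeed, $\sum_{\nu>\nu_0} u_\nu \le 1/d_{\nu_0+1} \le 1/d_{\nu_0} \le \sum_{\nu\le\nu_0} c_\nu$, the last step by (61) and the monotonicity of $d_\nu$. Applying this to (59) with $u_\nu = t_\nu^2$, $c_\nu = \epsilon^2N_\nu^2(q'_\nu)^2$ and $d_\nu = N_\nu^{2\sigma}/C^2$, we obtain
$$R_P(\Theta_a,\epsilon) \le c\epsilon^2\sum_{\nu\le\nu_0} N_\nu^2q_{n(\nu)+1}^2. \tag{62}$$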

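Here $c_\nu d_\nu = \epsilon^2C^{-2}N_\nu^{2\alpha+2}q_{n+1}^2 = \epsilon^2C^{-2}(lq_n)^{2\alpha+2}q_{n+1}^2$ if $\nu = (n,l)$. Let $M_n = \{\nu : q_n \le N_\nu < q_{n+1}\}$ and note, since $(l_n+1)q_n > q_{n+1}$, that

$$CD_n := \sum_{\nu\in M_n}c_\nu d_\nu = \epsilon^2C^{-2}q_n^{2\alpha+2}q_{n+1}^2\sum_{l=1}^{l_n}l^{2\alpha+2} \ge c\,\epsilon^2C^{-2}l_n^{2\alpha+3}q_n^{2\alpha+2}q_{n+1}^2 \ge c\,\epsilon^2C^{-2}l_n\,q_n^{2\alpha+4}.$$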

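Let $n_0$ be the first index $n$ for which $CD_n \ge 1$: since $CD_{n_0-1} < 1$, we have

$$l_{n_0-1}\,q_{n_0-1}^{2\alpha+4} < \frac{C^2}{c\,\epsilon^2} \quad\text{and so}\quad q_{n_0} \le 2\,l_{n_0-1}\,q_{n_0-1} < 2\,l_{n_0-1}^{(2\alpha+3)/(2\alpha+4)}\Big(\frac{C^2}{c\,\epsilon^2}\Big)^{1/(2\alpha+4)}. \tag{63}$$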

Since (62), together with (63), is exactly the situation reached at (48) in the hyperrectangle case (with r replaced by a), we conclude that the bounds (50) apply (with $r$ replaced by $\tilde r$).

Lower bound. Arguing exactly as at (53), but with r replaced by cr, 

$$R_P(\Theta,\epsilon) \ge \sup_n\ C^2q_n^{-2\alpha}\wedge c\,\epsilon^2q_n^2q_{n+1}^2. \tag{64}$$

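In the BA case, let $n_0$ be the last index $n$ for which $\epsilon^2q_n^4 \le C^2q_n^{-2\alpha}$, so that $q_{n_0}^{2\alpha+4} \le C^2\epsilon^{-2}$ and $q_{n_0}^{-2\alpha} \ge (\epsilon/C)^{2\alpha/(\alpha+2)}$. From (64) at $n = n_0+1$, we find

$$R_P(\Theta,\epsilon) \ge c\,C^2q_{n_0+1}^{-2\alpha} \ge c\,C^2q_{n_0}^{-2\alpha} \ge c\,C^{2(1-\tilde r)}\epsilon^{2\tilde r}.$$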

For the almost all case, the argument is the same as before at (54) and 
below. 
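To see the mechanism behind (64) and the choice of $n_0$, here is a small numerical sketch for the golden-ratio conjugate, whose convergent denominators are the Fibonacci numbers. We drop the constant $c$ and replace $q_{n+1}$ by $q_n$, which for a BA number costs only constant factors. The ratio of the resulting lower bound to $C^{2(1-\tilde r)}\epsilon^{2\tilde r}$ then stays between $1/16$ and $1$, because consecutive Fibonacci numbers differ by at most a factor of 2 and $\min(C^2q^{-2\alpha}, \epsilon^2q^4)$ is largest where the two curves cross.

```python
def fibonacci_denominators(limit):
    """1, 1, 2, 3, 5, ...: convergent denominators of (sqrt(5) - 1) / 2, up to limit."""
    qs = [1, 1]
    while qs[-1] < limit:
        qs.append(qs[-1] + qs[-2])
    return qs

def simplified_lower_bound(eps, C, alpha, qs):
    """sup_n min(C^2 q_n^{-2 alpha}, eps^2 q_n^4): form (64) with c = 1
    and q_{n+1} replaced by q_n."""
    return max(min(C ** 2 * q ** (-2 * alpha), eps ** 2 * q ** 4) for q in qs)

def target_rate(eps, C, alpha):
    """C^{2(1 - r)} eps^{2 r} with r = alpha / (alpha + 2), as in Theorem 2."""
    r = alpha / (alpha + 2)
    return C ** (2 * (1 - r)) * eps ** (2 * r)

qs = fibonacci_denominators(10 ** 9)
for alpha in (0.5, 1.0, 3.0):
    for eps in (1e-2, 1e-4, 1e-6, 1e-8):
        ratio = simplified_lower_bound(eps, 1.0, alpha, qs) / target_rate(eps, 1.0, alpha)
        print(f"alpha={alpha}, eps={eps:g}: ratio={ratio:.3f}")
```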

5. Discussion. 

5.1. Periodic vs. nonperiodic. Recent papers by Hall, Ruymgaart, van Gaans and van Rooij (2001) and Groeneboom and Jongbloed (2003) consider in part a density estimation version of the deconvolution problem in which the data consist of an i.i.d. sample $Y_i = X_i + Z_i$, in which the $X_i$ are i.i.d. with unknown density $f$ and the $Z_i$ are i.i.d. uniform on $[-a,a]$ and independent of the $X_i$.

Groeneboom and Jongbloed (2003) derive pointwise limiting distributions of estimators of $f$ based on kernel smooths of nonparametric MLEs of the distribution function of $f$. The work of Hall, Ruymgaart, van Gaans and van Rooij (2001) looks at maximum global estimation errors, and so is perhaps closer in spirit to the present investigation. Instead of any periodicity assumptions, it is assumed there that the density $f$ has compact support on $\mathbb{R}$. The compact support permits an explicit inversion formula: if $g = K_af$ and $l$ is chosen large enough that $x - 2la < \inf\operatorname{supp}f$, then

$$f(x) = 2a\sum_{i=1}^{l}g'\big(x-(2i-1)a\big).$$

(Differentiating (2) gives $2a\,g'(x) = f(x+a) - f(x-a)$, so the sum telescopes to $f(x) - f(x-2la) = f(x)$.)
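The inversion formula can be checked numerically. The sketch below uses an arbitrary smooth bump $f$ supported on $[0,1]$ (our choice, not from either paper) and computes $g'$ from the identity $2a\,g'(x) = f(x+a) - f(x-a)$; it only exercises the telescoping step, not the statistical estimation problem.

```python
import math

def f(x):
    """Illustrative density shape: sin^2(pi x) on [0, 1], zero elsewhere."""
    return math.sin(math.pi * x) ** 2 if 0.0 <= x <= 1.0 else 0.0

def g_prime(x, a):
    """Derivative of g = K_a f, using 2a g'(x) = f(x + a) - f(x - a)."""
    return (f(x + a) - f(x - a)) / (2.0 * a)

def invert(x, a, inf_supp=0.0):
    """f(x) = 2a * sum_{i=1}^{l} g'(x - (2i - 1) a), with l the smallest
    nonnegative integer such that x - 2la < inf supp f."""
    l = max(0, math.floor((x - inf_supp) / (2.0 * a)) + 1)
    return 2.0 * a * sum(g_prime(x - (2 * i - 1) * a, a) for i in range(1, l + 1))

for a in (0.1, 0.37):
    for x in (-0.4, 0.05, 0.3, 0.77, 1.2):
        print(a, x, f(x), invert(x, a))
```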

In this case Hall, Ruymgaart, van Gaans and van Rooij (2001) show that the DIP of $K_a$ equals 1 for function classes of both hyperrectangle and ellipsoid type, in contrast to the results found for the periodic model considered here. The difference in results may perhaps be understood by observing that sinusoids, which are basic to the periodic model, do not have compact support. Thus, the models capture genuinely different phenomena.


5.2. Effect of rational approximations to a. In practice, computer code works with rational numbers. What effect will this have on our conclusions? A few remarks can be made even without getting into specifics of particular models of computation or attempting a full analysis.

A basic issue is whether the boxcar width $a$ is under the investigator’s control. If it is (our first scenario), then we might imagine replacing $a$ by $a_m = p_m/q_m$, say, so that model (4) becomes


$$y_k = r_k(a_m)\theta_k + \epsilon z_k, \qquad r_k(a_m) = \frac{\sin\pi ka_m}{\pi ka_m}. \tag{65}$$


Here $p_m/q_m$ might be one of the sequence of best rational approximations to $a$. The approximation results of Section 2.2 show that our analysis of estimation in model (65) is unchanged from that of irrational $a$, at least for frequencies $k < q_m$, since $a$ and $a_m$ will have the same convergents $p_r/q_r$ for $r < m$. Thus, one could simply choose $q_m$ large enough that the tail bias accruing to frequencies above $q_m$ is negligible. To be more specific, assume that $\Theta$ is a hyperrectangle and that $\epsilon$ is known. Let $\eta > 0$ be small [we could let $\eta(\epsilon) \to 0$ with $\epsilon$ to preserve rates of convergence]. We can choose $k_2 > k_1(\epsilon)$ [defined at (46)] so that the tail bias

$$\sum_{k>k_2}\tau_k^2 \le \eta\,R(H^\alpha(C),\epsilon),$$

and then choose $m$ large enough that $q_m \ge k_2$. A minimax estimator for $H^\alpha(C)$ under model (65) will be essentially identical in structure with one for the original irrational $a$, since in either case the zero estimator is used at all frequencies $k > q_m$.
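To illustrate both remarks, the following sketch uses $a = \sqrt2 - 1 = [0; 2, 2, 2, \dots]$ as a stand-in for an irrational half-width (our example). It computes continued-fraction digits exactly for the convergent $a_m = 29/70$ and approximately for $a$, and evaluates $r_k(a_m)$ from (65). The leading digits agree, and $r_k(a_m)$ vanishes at $k = q_m = 70$ but stays away from zero for $k < q_m$.

```python
import math
from fractions import Fraction

def digits_of_rational(x):
    """Continued-fraction digits of a rational number x (a Fraction), exactly."""
    digits = []
    while True:
        d = math.floor(x)
        digits.append(d)
        x -= d
        if x == 0:
            return digits
        x = 1 / x

def digits_of_float(x, n_terms):
    """First n_terms continued-fraction digits of a float (rounding limits
    how many are trustworthy)."""
    digits = []
    for _ in range(n_terms):
        d = math.floor(x)
        digits.append(d)
        x = 1.0 / (x - d)
    return digits

def r_k(k, a):
    """Eigenvalue sin(pi k a) / (pi k a) from (65)."""
    return math.sin(math.pi * k * a) / (math.pi * k * a)

a = math.sqrt(2) - 1
a_m = Fraction(29, 70)                                       # a convergent of a
print(digits_of_float(a, 6), digits_of_rational(a_m))
print(r_k(70, float(a_m)))                                   # essentially zero: frequency 70 is lost
print(min(abs(r_k(k, float(a_m))) for k in range(1, 70)))    # bounded away from zero
```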

In the second scenario, the boxcar width $a$ is determined by nature and the investigator must work with the data $y$ from model (4). We still assume that the value of $a$ is known, but must use rational approximations to $a$ in our estimators based on $y$. For definiteness, consider again the case $\Theta = H^\alpha(C)$ and set $\tau_k = Ck^{-\alpha}$. Consider the risk of linear rules $\hat\theta_k(y) = c_ky_k$ if $\epsilon_k < \tau_k$ and $\hat\theta_k(y) = 0$ otherwise. If $S = \{k : \epsilon_k < \tau_k\}$, then the risk of such a rule is


$$r(c,\theta) = \sum_{k\in S}\big[c_k^2\epsilon^2 + (1-c_kr_k)^2\theta_k^2\big] + \sum_{k\notin S}\theta_k^2.$$

Suppose that $a$ is irrational: with infinite precision, we could use the estimator $c_k = 1/r_k$, which makes $\sup_{\theta\in\Theta}r(c,\theta) = \sum_k\tau_k^2\wedge\epsilon_k^2$. Now consider the difference in risk that results from an approximation $\hat c_k = 1/\hat r_k$, where $\hat r_k = (\sin\pi k\hat a)/(\pi k\hat a)$ for some rational approximation $\hat a = p_m/q_m$ to $a$.


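Since $c_k^2\epsilon^2 = \epsilon_k^2$ and $1 - c_kr_k = 0$ for $k \in S$, we have

$$r(\hat c,\theta) - r(c,\theta) = \sum_{k\in S}\Big[\Big(\frac{r_k^2}{\hat r_k^2}-1\Big)\epsilon_k^2 + \Big(1-\frac{r_k}{\hat r_k}\Big)^2\theta_k^2\Big],$$

and so, if we write $r_k/\hat r_k = 1 + \delta_k$ and assume that $\delta = \sup_{k\in S}|\delta_k| < 1$, then

$$\sup_{\theta\in\Theta}|r(\hat c,\theta) - r(c,\theta)| \le 3\delta R_P(\Theta,\epsilon) + \delta^2\sum_{k\in S}\tau_k^2. \tag{66}$$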


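Using a derivative bound on $a \mapsto \sin\pi ka$ and then (7),

$$|\delta_k| \le \frac{\hat a}{a}\Big|\frac{\sin\pi ka}{\sin\pi k\hat a}-1\Big| + \Big|\frac{\hat a}{a}-1\Big| \le \frac{|a-\hat a|}{a}\Big(\frac{\pi k\hat a}{|\sin\pi k\hat a|}+1\Big) \le c\,\frac{|a-\hat a|\,k}{a\,\|k\hat a\|}.$$

If $\hat a = p_m/q_m$ and $k < q_r$, then from (26), (23) and (25),

$$|\delta_k| \le \frac{c}{a}\Big(\frac{q_r}{q_m}\Big)^2.$$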

Consequently, the risk difference due to using a rational approximation $\hat a$ can be made as small as desired by first selecting $r$ so that $\sup\{k : k\in S(\epsilon)\} < q_r$ and then $m$ so that the bound on $\delta_k$ and, hence, $\delta$ is as small as needed.
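The following sketch checks (66) and the decay of $\delta$ numerically in one illustrative configuration chosen by us: $a = (\sqrt5-1)/2$, $\epsilon = 0.01$, $\tau_k = k^{-2}$ (so $C = 1$, $\alpha = 2$), frequencies truncated at $k \le 200$, and $\hat a$ running through the convergents $13/21$, $55/89$, $233/377$. For each $\hat a$ it compares $|r(\hat c,\theta) - r(c,\theta)|$ with the right side of (66) at a few points $\theta$ of the hyperrectangle; the truncated ideal risk is used for $R_P$, which only makes the check stricter.

```python
import math
import random

def r_k(k, a):
    """Eigenvalue sin(pi k a) / (pi k a)."""
    return math.sin(math.pi * k * a) / (math.pi * k * a)

def risk(c, theta, eps, r_true, S):
    """r(c, theta) = sum_{k in S} [c_k^2 eps^2 + (1 - c_k r_k)^2 theta_k^2]
    + sum_{k not in S} theta_k^2 (theta is a dict indexed by frequency k)."""
    total = 0.0
    for k, th in theta.items():
        if k in S:
            total += c[k] ** 2 * eps ** 2 + (1.0 - c[k] * r_true[k]) ** 2 * th ** 2
        else:
            total += th ** 2
    return total

a, eps, K = (math.sqrt(5) - 1) / 2, 0.01, 200
ks = range(1, K + 1)
tau = {k: k ** -2.0 for k in ks}
r_true = {k: r_k(k, a) for k in ks}
eps_k = {k: eps / abs(r_true[k]) for k in ks}
S = {k for k in ks if eps_k[k] < tau[k]}
R_P = sum(min(tau[k] ** 2, eps_k[k] ** 2) for k in ks)     # truncated ideal risk
c_exact = {k: 1.0 / r_true[k] for k in S}

rng = random.Random(3)
thetas = [tau, {k: 0.0 for k in ks}, {k: rng.uniform(-1, 1) * tau[k] for k in ks}]
deltas = []
for p, q in ((13, 21), (55, 89), (233, 377)):
    a_hat = p / q
    c_hat = {k: 1.0 / r_k(k, a_hat) for k in S}
    delta = max(abs(r_true[k] / r_k(k, a_hat) - 1.0) for k in S)
    bound = 3 * delta * R_P + delta ** 2 * sum(tau[k] ** 2 for k in S)
    worst = max(abs(risk(c_hat, th, eps, r_true, S) - risk(c_exact, th, eps, r_true, S))
                for th in thetas)
    assert worst <= bound + 1e-14                           # inequality (66)
    deltas.append(delta)
    print(f"a_hat={p}/{q}: delta={delta:.2e}, max diff={worst:.2e}, bound={bound:.2e}")
```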


5.3. Generalizations. 1. It seems likely that estimators which are adaptive with respect to $\alpha$ and $C$ could be constructed (for a fixed irrational $a$) by grouping frequencies $k$ within a given block $[q_n,q_{n+1})$ into a number of subblocks according to the value of $\|ka\|$ and then using some form of James–Stein shrinkage within each subblock. This methodology is now quite well established on other inverse problems with monotone eigenvalues; see, for example, Cavalier and Tsybakov (2002). Alternatively, adaptivity (up to logarithmic terms) is established via a wavelet deconvolution approach in Johnstone, Kerkyacharian, Picard and Raimondo (2004) for a class of Besov spaces including ellipsoids (13).
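A hypothetical sketch of the grouping idea in item 1 follows. It is not an estimator from this paper or from the cited works. The number of subblocks and the simple positive-part, Stein-type factor $(1 - \sum_k\epsilon_k^2/\sum_kw_k^2)_+$ applied to $w_k = y_k/r_k$ are our own assumptions, and the rules in the cited literature differ in detail. Sorting by $\|ka\|$ is meant to make the noise levels $\epsilon_k = \epsilon/|r_k|$ roughly comparable within each subblock.

```python
import math
import random

def dist_to_int(x):
    """||x||: distance from x to the nearest integer."""
    return abs(x - round(x))

def subblock_james_stein(y, a, eps, block, n_sub):
    """Hypothetical sketch: sort the frequencies of `block` by ||k a||, cut
    them into n_sub subblocks of about equal size, and apply positive-part
    Stein-type shrinkage to w_k = y_k / r_k inside each subblock."""
    r = {k: math.sin(math.pi * k * a) / (math.pi * k * a) for k in block}
    order = sorted(block, key=lambda k: dist_to_int(k * a))
    size = math.ceil(len(order) / n_sub)
    estimate = {}
    for start in range(0, len(order), size):
        sub = order[start:start + size]
        w = {k: y[k] / r[k] for k in sub}
        noise = sum((eps / abs(r[k])) ** 2 for k in sub)     # sum of eps_k^2
        energy = sum(v * v for v in w.values())
        shrink = max(0.0, 1.0 - noise / energy) if energy > 0 else 0.0
        for k in sub:
            estimate[k] = shrink * w[k]
    return estimate

# Simulated data from y_k = r_k theta_k + eps z_k on the block [21, 34)
rng = random.Random(0)
a, eps = (math.sqrt(5) - 1) / 2, 1e-3
block = list(range(21, 34))
theta = {k: k ** -2.0 for k in block}
y = {k: math.sin(math.pi * k * a) / (math.pi * k * a) * theta[k] + eps * rng.gauss(0, 1)
     for k in block}
est = subblock_james_stein(y, a, eps, block, n_sub=3)
```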

2. The ellipsoid results might also have been derived using the explicit evaluation of minimax risk given by Pinsker (1980). However, the method used here allows extension of the rate results to weighted $\ell_r$ bodies, for $r > 1$, using essentially the same argument as for ellipsoids. For example, the analog of (58) states that if the ordered increasing $\epsilon_{(k)}$ corresponding to indices within a block $B_\nu$ satisfy some bound $\epsilon_{(k+1)}/\epsilon_{(k)} \le \gamma$ (as happens for the boxcar $K_a$), then


K{ty,e) =sup 


k^Bi/ k^Bi/ 



where Iq = sup{Z: ^fj) ^ and such sums can be estimated by the 

methods of this paper. 

3. It is straightforward to extend the results of this paper to iterated kernels $K_a = ((2a)^{-1}I_{[-a,a]})^{*m}$ with eigenvalues $r_k = (\sin\pi ka)^m/(\pi ka)^m$. However, kernels of the form $K_{a,b} = (2a)^{-1}I_{[-a,a]} * (2b)^{-1}I_{[-b,b]}$ have eigenvalues whose absolute values satisfy

$$\Big|\frac{\sin\pi ka}{\pi ka}\cdot\frac{\sin\pi kb}{\pi kb}\Big| \asymp \frac{\|ka\|\,\|kb\|}{k^2ab},$$

while the linear motion kernel (6) has eigenvalues

$$\frac{\sin\pi(k_1a + k_2ra)}{\pi(k_1a + k_2ra)}.$$

Considerable work exists on simultaneous Diophantine approximation problems [Schmidt (1980), Chapter 2], but whether this enables rate of convergence calculations is an open question.


APPENDIX 


Proof of (27) and (28). We recall the convergence/divergence theorem of Khinchin [(1992), Theorem 32]. Let $\psi(x)$ be a positive continuous function of $x > 0$ such that $x\psi(x)$ is nonincreasing. Then the inequality $\|qa\| < \psi(q)$ has, for almost all $a$, a finite or infinite number of solutions in positive integers $q$ according as $\int^\infty\psi(x)\,dx$ converges or diverges.

For (27), consider $\psi(x) = (2x\log x)^{-1}$. Since the integral diverges, let $q$ be one of the infinitely many solutions to $\|qa\| < \psi(q)$ and choose $n$ so that $q_n \le q < q_{n+1}$. It then follows from (22) and the property stated after (21) that

$$\frac{1}{2q_{n+1}} < \|q_na\| \le \|qa\| < \frac{1}{2q\log q} \le \frac{1}{2q_n\log q_n},$$

from which (27) is immediate.

For (28), consider $\psi(x) = x^{-1}(\log x)^{-1-\delta}$. Since the integral converges, for all $q > q(a,\delta)$ we have $\|qa\| \ge \psi(q)$. In particular, from (22), for large $n$,

$$\frac{1}{q_{n+1}} > \|q_na\| \ge \frac{1}{q_n(\log q_n)^{1+\delta}},$$

from which we obtain (28). □

Proof of (29). The method used to establish (29) for direct data may be extended in a straightforward manner to model (9), for example, by stepping through the arguments in Johnstone [(2003), Hyperrectangles chapter]. The key step in this approach, as in Donoho, Liu and MacGibbon (1990), is to establish that

$$R_L(\Theta,\epsilon) = \sup_{\tau\in\Theta}R_L(\Theta(\tau),\epsilon), \tag{67}$$

where $\Theta(\tau)$ is the hyperrectangle $\prod_i[-\tau_i,\tau_i]$. This can be reduced to the Kneser–Kuhn minimax theorem [Johnstone (2003), Corollary A.4] applied to the payoff function

$$f(c,s) = \sum_k\big[c_k^2\epsilon^2 + (1-c_k)^2s_k\big], \tag{68}$$

defined for $(c,s) \in \ell_2(\mathbb{N})\times\ell_1(\mathbb{N})$. But result (67) extends immediately to model (9) by replacing $\epsilon^2$ with $\epsilon_k^2$ in (68), changing the domain of $c$ to the weighted Hilbert space $\ell_2(\mathbb{N},(\epsilon_k^2)) = \{c : \sum_k\epsilon_k^2c_k^2 < \infty\}$ and applying the minimax theorem in the same way. □
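For fixed $s$, the payoff (68) in its model-(9) form (with $\epsilon_k^2$ in place of $\epsilon^2$) separates over coordinates. Each term $c_k^2\epsilon_k^2 + (1-c_k)^2s_k$ is a convex quadratic in $c_k$, minimized at $c_k = s_k/(s_k+\epsilon_k^2)$ with minimum value $\epsilon_k^2s_k/(\epsilon_k^2+s_k)$. With $s_k = \tau_k^2$ this is the linear minimax risk over the single hyperrectangle $\Theta(\tau)$, which (67) then maximizes over $\tau$. A short numerical check (names are ours):

```python
import random

def payoff(c, s, eps_sq):
    """Payoff (68) with eps^2 replaced by eps_k^2: sum_k [c_k^2 eps_k^2 + (1 - c_k)^2 s_k]."""
    return sum(ck * ck * e + (1.0 - ck) ** 2 * sk for ck, sk, e in zip(c, s, eps_sq))

def best_response(s, eps_sq):
    """Minimizer c_k = s_k / (s_k + eps_k^2) and minimum sum eps_k^2 s_k / (eps_k^2 + s_k)."""
    c = [sk / (sk + e) for sk, e in zip(s, eps_sq)]
    value = sum(e * sk / (e + sk) for sk, e in zip(s, eps_sq))
    return c, value

rng = random.Random(11)
s = [rng.uniform(0.0, 2.0) for _ in range(6)]
eps_sq = [rng.uniform(0.05, 1.0) for _ in range(6)]
c_star, v_star = best_response(s, eps_sq)
assert abs(payoff(c_star, s, eps_sq) - v_star) < 1e-12
for _ in range(100):
    c = [ck + rng.uniform(-0.3, 0.3) for ck in c_star]
    assert payoff(c, s, eps_sq) >= v_star - 1e-12     # no perturbation does better
```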